Dedicated to our dear colleague and friend Henryk Wozniakowski on the occasion
of his 60th birthday.

Abstract. We study the integration of functions with respect to an unknown
density. Information is available as oracle calls to the integrand and to the
non-normalized density function. We are interested in analyzing the integration
error of optimal algorithms (or the complexity of the problem) with emphasis on
the variability of the weight function. For a corresponding large class of
problem instances we show that the complexity grows linearly in the variability
and the simple Monte Carlo method provides an almost optimal algorithm. Under
additional geometric restrictions (mainly log-concavity) for the density
functions, we establish that a suitable adaptive local Metropolis algorithm is
almost optimal and outperforms any non-adaptive algorithm.

1. Introduction, Problem description 
In many applications one wants to compute an integral of the form

(1)    $\int_\Omega f(x) \cdot c\,g(x)\,\mu(dx)$

with a density $c\,g(x)$, $x \in \Omega$, where $c > 0$ is unknown and $\mu$ is a
probability measure. Of course we have $1/c = \int_\Omega g(x)\,\mu(dx)$, but the
numerical computation of the latter integral is often as hard as the original
problem (1). Therefore it is desirable to have algorithms which are able to
approximately compute (1) without knowing the normalizing constant, based solely
on $n$ function values of $f$ and $g$. In other terms, these functions are given
by an oracle, i.e., we assume that we can compute function values of $f$ and $g$.

Solution operator. Assume that we are given any class $\mathcal{F}(\Omega)$ of
input data $(f, g)$ defined on a set $\Omega$. We can rewrite the integral in (1) as

(2)    $S(f, g) = \frac{\int_\Omega f(x) \cdot g(x)\,\mu(dx)}{\int_\Omega g(x)\,\mu(dx)}, \qquad (f, g) \in \mathcal{F}(\Omega).$

This solution operator is linear in $f$ but not in $g$. We discuss algorithms for
the (approximate) computation of $S(f, g)$.

Remark 1. This solution operator is closely related to systems in statistical
mechanics, which obey a Boltzmann (or Maxwell or Gibbs) distribution, i.e., when
there is a countable number $j = 1, 2, \dots$ of microstates with energies, say
$E_j$, and the overall system is distributed according to the Boltzmann
distribution, with inverse temperature $\beta$, as

    $P_\beta(j) := \frac{e^{-\beta E_j}}{Z_\beta}, \qquad j = 1, 2, \dots$

In this case the normalizing constant $Z_\beta$ is the partition function,
corresponding to $1/c$ from (1), and $g_\beta(j) = e^{-\beta E_j}$ for $j \in \mathbb{N}$.

In this setup, if $A$ is any global thermodynamic quantity, then its expected
value $\langle A \rangle_\beta$ is given by

    $\langle A \rangle_\beta = \frac{1}{Z_\beta} \sum_j A(j)\, e^{-\beta E_j},$

which can be written as $S(A, g_\beta)$. Observe, however, that we use here
slightly different assumptions since we use the counting measure on $\mathbb{N}$,
not a probability measure.

Randomized methods. Monte Carlo methods (randomized methods) are important 
numerical tools for integration and simulation in science and engineering, we refer 
to the recent special issue [7] . The Metropolis method, or more accurately, the class 
of Metropolis-Hastings algorithms ranges among the most important methods in 
numerical analysis and scientific computation, see [SI [23] . 

Here we consider randomized methods $S_n$ that use $n$ function evaluations of
$f$ and $g$. Hence $S_n$ is of the form exhibited in Figure 1.



Algorithm: $S_n(f, g)$
Data: Functions $f$, $g$, random numbers $\omega_1, \dots, \omega_n$;
Result: approximate value $S_n(f, g)$ for $S(f, g)$ from Eq. (2);
begin
    Init $x_1 := x_1(\omega_1)$, Compute $f(x_1)$ and $g(x_1)$;
    for $i = 2, \dots, n$ do
        Step $x_i := x_i(f(x_1), \dots, f(x_{i-1}), g(x_1), \dots, g(x_{i-1}), \omega_i)$;
        Compute $f(x_i)$ and $g(x_i)$;
    end
    Compute $S_n(f, g) = \varphi_n(f(x_1), \dots, f(x_n), g(x_1), \dots, g(x_n)) \in \mathbb{R}$;
end

Figure 1. Generic Monte Carlo algorithm based on $n$ values of $f$
and $g$. The final Compute may use any mapping $\varphi_n : \mathbb{R}^{2n} \to \mathbb{R}$.

In all steps, random number generators may be used to determine the consecutive
node. If the nodes $x_i$ from Step do not depend on previously computed values of
$f(x_1), \dots, f(x_{i-1})$ and $g(x_1), \dots, g(x_{i-1})$, then the algorithm is
called non-adaptive, otherwise it is called adaptive. Specifically we analyze the
procedures $S_n^{simple}$ and $S_n^{mh}$, introduced in (3) and (5) below.

Remark 2. The notion of adaption which is used here differs from the one recently
used to introduce adaptive MCMC, see e.g. [U(3]. The Metropolis algorithm which
is used in this paper is based on a homogeneous Markov chain; in our notation this
is still an adaptive algorithm since the used nodes $x_i$ depend on $g$. Hence we
use the concept of adaptivity from numerical analysis and information-based
complexity, see [22].

For details on the model of computation we refer to [201 [2B [21]. Here we only
mention the following: We use the real number model and assume that $f$ and $g$
are given by an oracle for function values. Our lower bounds hold under very
general assumptions concerning the available random number generator.¹

For the upper bounds we only study two algorithms in this paper, described in (3)
and (5), below. Specifically we shall deal with the (non-adaptive) simple Monte
Carlo method and a specific (adaptive) Metropolis-Hastings method. The former can
only be applied if a random number generator for $\mu$ on $\Omega$ is available.
Thus there are natural situations when this method cannot be used. The latter
will be based on a suitable ball walk. Hence we need a random number generator
for the uniform distribution on a (Euclidean) ball. Thus the Metropolis-Hastings
method can also be applied when a random number generator for $\mu$ on $\Omega$ is
not available. Instead, we need a "membership oracle" for $\Omega$: On input
$x \in \mathbb{R}^d$ this oracle can decide with cost 1 whether $x \in \Omega$ or not.

Error criterion. We are interested in error bounds uniformly for classes
$\mathcal{F}(\Omega)$ of input data. If $S_n$ is any method that uses (at most) $n$
values of $f$ and $g$ then the (individual) error for the problem instance
$(f, g) \in \mathcal{F}(\Omega)$ is given by

    $e(S_n, (f, g)) = \left( E\,|S(f, g) - S_n(f, g)|^2 \right)^{1/2},$

where $E$ means the expectation. The overall (or worst case) error on the class
$\mathcal{F}(\Omega)$ is

    $e(S_n, \mathcal{F}(\Omega)) = \sup_{(f, g) \in \mathcal{F}(\Omega)} e(S_n, (f, g)).$

The complexity of the problem is given by the error of the best algorithm, hence
we let

    $e_n(\mathcal{F}(\Omega)) := \inf_{S_n} e(S_n, \mathcal{F}(\Omega)).$

The classes $\mathcal{F}(\Omega)$ under consideration will always contain constant
densities $g = c > 0$ and all $f$ with $\|f\|_\infty \le 1$, hence

    $\mathcal{F}_1(\Omega) := \{(f, g),\ |f(x)| \le 1,\ x \in \Omega,\ \text{and}\ g = c\} \subset \mathcal{F}(\Omega).$

On this class the problem (2) reduces to the classical integration problem for
uniformly bounded functions, and it is well known that the error of any Monte
Carlo method can decrease at a rate $n^{-1/2}$, at most. Precisely, it holds true
that

    $e_n(\mathcal{F}_1(\Omega)) = \frac{1}{1 + \sqrt{n}},$

if the probability $\mu$ is non-atomic, see [17]. On the other hand we will only
consider $(f, g)$ with $S(f, g) \in [-1, 1]$, hence the trivial algorithm
$S_0 = 0$ always has error 1.

For the classes $\mathcal{F}_C(\Omega)$ and $\mathcal{F}_\alpha(\Omega)$, which will
be introduced in Section 2, we easily obtain the optimal order
$e_n(\mathcal{F}(\Omega)) \asymp n^{-1/2}$. We will analyze how
$e_n(\mathcal{F}(\Omega))$ depends on the parameters $C$ and $\alpha$, in case
$\mathcal{F}(\Omega) := \mathcal{F}_C(\Omega)$ or
$\mathcal{F}(\Omega) := \mathcal{F}_\alpha(\Omega)$, respectively.

¹ Observe, however, that we cannot use a random number generator for the "target
distribution" $\mu_g = g \cdot \mu / \|g\|_1$, since $g$ is part of the input.

We discuss some of our subsequent results and provide a short outline. In
Section 2 we shall specify the methods and classes of input data to be analyzed.
The classes $\mathcal{F}_C(\Omega)$, analyzed first in Section 3, contain all
densities $g$ with $\sup g / \inf g \le C$. In typical applications we may face
$C = 10^{20}$. Then we cannot decrease the error of optimal methods from 1 to 0.7
even with sample size $n = 10^{15}$, see Theorem 1 for more details. Hence the
classes $\mathcal{F}_C(\Omega)$ are so large that no algorithm, deterministic or
Monte Carlo, adaptive or non-adaptive, can provide an acceptable error. We also
prove that the simple (non-adaptive) Monte Carlo method is almost optimal; no
sophisticated Markov chain Monte Carlo method can help.

Thus we face the question whether adaptive algorithms, such as the Metropolis
algorithm, help significantly on "suitable and interesting" subclasses of
$\mathcal{F}_C(\Omega)$. We give a positive answer for the classes
$\mathcal{F}_\alpha(\Omega)$, analyzed in Section 4. Here we assume that
$\Omega \subset \mathbb{R}^d$ is a convex body, and that $\mu$ is the normalized
Lebesgue measure $\mu_\Omega$ on $\Omega$. The class $\mathcal{F}_\alpha(\Omega)$
contains logconcave densities, where $\alpha$ is the Lipschitz constant of
$\log g$. We shall establish in § 4.1 that all non-adaptive methods (such as the
simple Monte Carlo method) suffer from the curse of dimension, i.e., we get
similar lower bounds as for the classes $\mathcal{F}_C(\Omega)$. However, in § 4.2
we shall design and analyze specific (adaptive) Metropolis algorithms that are
based on some underlying ball walks, tuned to the class parameters. Using such
algorithms we can break the curse of dimension by adaption. The main error
estimate for this algorithm is given in
Theorem and we conclude this study with further discussion in the final Section 

2. Specific methods and classes of input 

We consider the approximate computation of $S(f, g)$ for large classes of input
data. Since with deterministic algorithms one cannot improve the trivial zero
algorithm (with error 1), we study randomized or Monte Carlo algorithms.

The methods. The Monte Carlo methods under consideration fit the schematic
view from Figure 1.

Simple Monte Carlo. Here the random numbers $\omega_1, \dots, \omega_n$ are
identically and independently distributed according to $\mu$, and the routine
Step chooses $x_i := \omega_i$. The final routine Compute is the quotient of the
sample means of the computed function values

(3)    $S_n^{simple}(f, g) := \frac{\sum_{j=1}^n f(X_j)\, g(X_j)}{\sum_{j=1}^n g(X_j)}.$
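To make the quotient of sample means concrete, here is a minimal Python sketch of the simple Monte Carlo estimator. It is only an illustration under stated assumptions: $\mu$ is taken to be the uniform distribution on $[0, 1]$, the functions `f` and `g` are arbitrary examples that do not come from the text, and the name `simple_mc` is hypothetical.

```python
import math
import random


def simple_mc(f, g, sampler, n, rng):
    """Quotient of sample means: sum f(X_j) g(X_j) / sum g(X_j), X_j i.i.d. from mu."""
    num = 0.0
    den = 0.0
    for _ in range(n):
        x = sampler(rng)  # non-adaptive: the node never depends on earlier values
        gx = g(x)
        num += f(x) * gx
        den += gx
    return num / den


ALPHA = 3.0  # illustrative parameter, not from the text


def g(x):
    # Non-normalized density; the factor 7.5 is arbitrary and cancels in the ratio.
    return 7.5 * math.exp(-ALPHA * x)


def f(x):
    return x


rng = random.Random(0)
estimate = simple_mc(f, g, lambda r: r.random(), 100_000, rng)

# Closed form of S(f, g) for this particular example, used only as a reference.
exact = (1 - (1 + ALPHA) * math.exp(-ALPHA)) / (ALPHA * (1 - math.exp(-ALPHA)))
```

The estimator never needs the normalizing constant of `g`, since any positive multiple of `g` gives the same quotient.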



Metropolis-Hastings method. This describes a class of (adaptive) Monte Carlo
methods which are based on the ingenious idea to construct in Step a Markov chain
having

(4)    $\mu_g := \frac{g \cdot \mu}{\int_\Omega g(x)\,\mu(dx)}$

as invariant distribution without knowing the normalization. Thus, if
$(X_1, X_2, \dots, X_n)$ is a trajectory of such a Markov chain, then we let
Compute be given as

(5)    $S_n^{mh}(f, g) := \frac{1}{n} \sum_{j=1}^n f(X_j).$

Hence we use $n$ steps of the Markov chain; the number of needed (different)
function values of $g$ and $f$ might be smaller. We will further specify the
Metropolis-Hastings algorithm for the problem at hand in § 4.2, see Figures 2 and
3 for a schematic presentation and Theorem 5 for the choice of $\delta$. Both
Monte Carlo methods construct Markov chains, i.e., the point $x_i$ depends on
$x_{i-1}$ and $g(x_{i-1})$, only. This trivially holds true for simple Monte
Carlo, since $x_i$ does not at all depend on earlier computed function values.
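The idea of averaging $f$ along a chain whose invariant distribution is proportional to $g$ can be sketched in a few lines of Python. This is a hedged illustration, not the algorithm specified later in the paper: it assumes $\Omega = [0, 1]$ with a symmetric uniform proposal of half-width `delta`, and it uses the standard Metropolis acceptance probability $\min\{1, g(y)/g(x)\}$. The functions `f`, `g` and the name `metropolis_estimate` are illustrative choices.

```python
import math
import random


def metropolis_estimate(f, g, n, delta, x0, rng):
    """Average of f over n states of a Metropolis chain on [0, 1] targeting g / int g."""
    x, gx = x0, g(x0)
    total = f(x)  # X_1 is the initial state
    for _ in range(n - 1):
        y = x + rng.uniform(-delta, delta)  # symmetric local proposal
        if 0.0 <= y <= 1.0:                 # membership oracle for Omega
            gy = g(y)
            if rng.random() * gx <= gy:     # accept with probability min(1, g(y)/g(x))
                x, gx = y, gy
        total += f(x)                       # a rejected move repeats the old state
    return total / n


ALPHA = 3.0  # illustrative parameter, not from the text


def g(x):
    # Only ratios g(y)/g(x) enter, so the constant 7.5 is never needed.
    return 7.5 * math.exp(-ALPHA * x)


def f(x):
    return x


rng = random.Random(0)
estimate = metropolis_estimate(f, g, 200_000, 0.5, 0.5, rng)

# Closed form of S(f, g) for this example, used only as a reference value.
exact = (1 - (1 + ALPHA) * math.exp(-ALPHA)) / (ALPHA * (1 - math.exp(-ALPHA)))
```

Note that the next node depends on the computed values of `g`, which is what makes the method adaptive in the sense used here.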

Remark 3. Comparisons of different Monte Carlo methods for problems similar
to (2) are frequently met in the literature. We mention [5] with a comparison of
Metropolis algorithms and importance sampling, where an error expansion at any
instance $(f, g)$ is given in terms of certain auto-correlations. The simple Monte Carlo
method, as introduced below, is also studied there as fli for g = 1. 

The (point-wise almost sure) convergence of both methods $S_n^{simple}$ and
$S_n^{mh}$, as $n \to \infty$, is ensured by corresponding ergodic theorems, see
[TJ]. But, as outlined above, we are interested in the uniform error on
relatively large problem classes.

The classes. Here we formally describe the classes of input under consideration. 

The class $\mathcal{F}_C(\Omega)$. Let $\mu$ be an arbitrary probability measure
on a set $\Omega$ and consider the set

    $\mathcal{F}_C(\Omega) = \{(f, g) \mid \|f\|_\infty \le 1,\ g > 0,\ \frac{g(x)}{g(y)} \le C,\ x, y \in \Omega\}.$

Note that necessarily $C \ge 1$. If $C = 1$ then $g$ is constant and we almost
face the ordinary integration problem, since $g$ can be recovered with only one
function value.

In many applications the constant $C$ is huge and we will establish that the
complexity of the problem (the cost of an optimal algorithm) is linear in $C$.
Therefore, for large $C$, the class is too large. We have to look for smaller
classes that contain many interesting pairs $(f, g)$ and have smaller complexity.

The class $\mathcal{F}_\alpha(\Omega)$ with log-concave densities. In many
applications, we have a weight $g$ with additional properties and we assume the
following:

• The set $\Omega \subset \mathbb{R}^d$ is a convex body, that is a compact and
  convex set with nonempty interior. The probability $\mu = \mu_\Omega$ is the
  normalized Lebesgue measure on the set $\Omega$.

• The functions $f$ and $g$ are defined on $\Omega$.

• The weight $g > 0$ is log-concave, i.e.,

    $g(\lambda x + (1 - \lambda) y) \ge g(x)^{\lambda} \cdot g(y)^{1 - \lambda},$

  where $x, y \in \Omega$ and $0 < \lambda < 1$.

• The logarithm of $g$ is Lipschitz, i.e.,
  $|\log g(x) - \log g(y)| \le \alpha \|x - y\|_2$.

Thus we consider the class of log-concave weights on
$\Omega \subset \mathbb{R}^d$ given by

(6)    $\mathcal{R}_\alpha(\Omega) = \{g \mid g > 0,\ \log g \text{ is concave},\ |\log g(x) - \log g(y)| \le \alpha \|x - y\|_2\}.$

We study the following class $\mathcal{F}_\alpha(\Omega)$ of problem elements,

(7)    $\mathcal{F}_\alpha(\Omega) = \{(f, g) \mid g \in \mathcal{R}_\alpha(\Omega),\ \|f\|_{2,g} \le 1\},$

where $\|\cdot\|_{2,g}$ is the $L_2$-norm with respect to the probability measure
$\mu_g$, see (4). In some places we restrict our study to the (Euclidean) unit
ball, i.e., $\Omega := B^d \subset \mathbb{R}^d$.

Remark 4. Let $\mathcal{R}_C(\Omega)$ be the class of weight functions that
belong to $\mathcal{F}_C(\Omega)$. Then
$\mathcal{R}_\alpha(\Omega) \subset \mathcal{R}_C(\Omega)$ if $C = e^{\alpha D}$,
where $D$ is the diameter of $\Omega$. Thus large $\alpha$ correspond to
"exponentially large" values of $C$. However, the densities from the class
$\mathcal{R}_\alpha(\Omega)$ have some extra (local) properties: they are
log-concave and Lipschitz continuous. These properties can be used for the
construction of fast adaptive methods, via rapidly mixing Markov chains.
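To get a feeling for how quickly $C = e^{\alpha D}$ grows, the following small Python sketch evaluates it for the unit ball, where the diameter is $D = 2$. The function name `variability_bound` and the sample values of $\alpha$ are illustrative assumptions.

```python
import math


def variability_bound(alpha, diameter):
    """C = exp(alpha * D): a bound on sup g / inf g for g with alpha-Lipschitz log g."""
    return math.exp(alpha * diameter)


# Unit ball B^d has diameter D = 2 in every dimension d.
for alpha in (1.0, 5.0, 23.0):
    print(alpha, variability_bound(alpha, 2.0))
```

With $\alpha = 23$ on the unit ball the value of $C$ is already close to $10^{20}$, the order of magnitude quoted as realistic for applications.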



3. Analysis for $\mathcal{F}_C(\Omega)$

We assume that $\Omega$ is an arbitrary set and $\mu$ is a probability measure on
$\Omega$, and that the functions $f$ and $g$ are defined on $\Omega$.

In the applications, the constant $C$ might be very large; something like
$C = 10^{20}$ is a realistic assumption. Therefore we want to know how the
complexity (the cost of optimal algorithms) depends on $C$. Observe that the
problem is correctly normalized or scaled such that
$S(\mathcal{F}_C(\Omega)) = [-1, 1]$, for any $C \ge 1$. We will prove that the
complexity of the problem is linear in $C$, and hence there is no way to solve the
problem if $C$ is really huge. We start with establishing a lower bound and then
show that simple Monte Carlo achieves this error up to a constant.

3.1. Lower Bounds. Here we prove lower bounds for all (adaptive or non-adaptive)
methods that use $n$ evaluations of $f$ and $g$. We use the technique of Bahvalov,
i.e., we study the average error of deterministic algorithms with respect to
certain discrete measures on $\mathcal{F}_C(\Omega)$.

Theorem 1. Assume that we can partition $\Omega$ into $2n$ disjoint sets with
equal measure (equal to $1/(2n)$). Then for any Monte Carlo method $S_n$ that uses
$n$ values of $f$ and $g$ we have the lower bound

(8)    $e(S_n, \mathcal{F}_C(\Omega)) \ge \frac{\sqrt{2}}{6} \begin{cases} \sqrt{\frac{C}{2n}}, & 2n \ge C - 1, \\ \frac{3C}{C + 2n - 1}, & 2n < C - 1. \end{cases}$

The lower bound will be obtained in two steps.

(1) We first reduce the error analysis for Monte Carlo sampling to the average
    case error analysis with respect to a certain prior probability on the class
    $\mathcal{F}_C(\Omega)$. This approach is due to Bahvalov, see [I].

(2) For the chosen prior the average case analysis can be carried out explicitly
    and will thus yield a lower bound.

To construct the prior let $m := 2n$ and $\Omega_1, \dots, \Omega_m$ the partition
into sets of equal probability, and $\chi_{\Omega_j}$ the corresponding
characteristic functions. Furthermore, let

    $l := \begin{cases} \lceil m/C \rceil, & m \ge C - 1, \\ 1, & \text{else.} \end{cases}$

Denote by $I_l^m$ the set of all subsets of $\{1, \dots, m\}$ of cardinality equal
to $l$, and by $\mu_{m,l}$ the equi-distribution on $I_l^m$, while $E_{m,l}$
denotes the expectation with respect to the prior $\mu_{m,l}$. Let
$(\varepsilon_1, \dots, \varepsilon_m)$ be independent and identically
distributed with $P(\varepsilon_j = -1) = P(\varepsilon_j = 1) = 1/2$,
$j = 1, \dots, m$. The overall prior is the product probability on
$I_l^m \times \{\pm 1\}^m$. For any realization
$\omega = (I, \varepsilon_1, \dots, \varepsilon_m)$ we assign

    $f_\omega := \sum_{j \in I} \varepsilon_j \chi_{\Omega_j} \quad\text{and}\quad g_\omega := C \sum_{j \in I} \chi_{\Omega_j} + \sum_{j \notin I} \chi_{\Omega_j}.$

The following observation is useful. 

Lemma 1. For any subset $N \subset \{1, \dots, m\}$ of cardinality at most $n$ it
holds

    $E_{m,l}\, \#(I \setminus N) \ge \frac{l}{2}.$

Proof. Clearly, for any fixed $k \in \{1, \dots, m\}$ we have
$\mu_{m,l}(k \in I) = l/m$, thus

    $E_{m,l}\, \#(I \setminus N) = \sum_{r \in N^c} E_{m,l}\, \chi_I(r) = \#(N^c)\, \frac{l}{m} \ge \frac{l}{2},$

where we denoted by $N^c$ the complement of $N$. □
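The counting identity behind Lemma 1 can be checked exactly for small parameters by enumerating all subsets. The following Python sketch does this with rational arithmetic; the values of `n`, `l` and the node set `N` are arbitrary illustrative choices, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def expected_unseen(m, l, nodes):
    """Exact expectation of #(I minus N) for I uniform among the l-subsets of {1, ..., m}."""
    subsets = list(combinations(range(1, m + 1), l))
    total = sum(len(set(subset) - nodes) for subset in subsets)
    return Fraction(total, len(subsets))


n = 3
m = 2 * n
l = 2
N = {1, 4, 5}  # any node set with at most n elements

value = expected_unseen(m, l, N)
# Lemma 1: value equals #(N^c) * l / m, which is at least l / 2 because #N <= n = m / 2.
```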

Proof of Theorem 1. Given the above prior let us denote

(9)    $e_n^{avg}(\mathcal{F}_C(\Omega)) := \inf_{q} \left( E_{m,l} E_\varepsilon\, |S(f, g) - q(f, g)|^2 \right)^{1/2},$

where the inf is taken with respect to any (possibly adaptive) deterministic
algorithm which uses at most $n$ values from $f$ and $g$.

For any Monte Carlo method $S_n$ we have, using Bahvalov's argument [4], the
relation

(10)    $e(S_n, \mathcal{F}_C(\Omega)) \ge e_n^{avg}(\mathcal{F}_C(\Omega)).$

We provide a lower bound for $e_n^{avg}(\mathcal{F}_C(\Omega))^2$. To this end
note that for each realization $(f_\omega, g_\omega)$ the integral
$\int g_\omega\, d\mu$ is constant. In the first case $m \ge C - 1$, and we can
bound the integral by the choice of $l$ as

    $c_{m,l} := \int_\Omega g_\omega(x)\, \mu(dx) = \frac{1}{m} \left( lC + (m - l) \right) \le 3.$

In the other case $m < C - 1$, we obtain $c_{m,1} = (C - 1 + m)/m$. Now, to analyze
the average case error, let $q_n$ be any (deterministic) method, and let us assume
that it uses the set $N$ of nodes. We have the decomposition

    $S(f_\omega, g_\omega) - q_n(f_\omega, g_\omega) = \left( \frac{C}{m\, c_{m,l}} \sum_{j \in I \setminus N} \varepsilon_j \right) + \left( \frac{C}{m\, c_{m,l}} \sum_{j \in I \cap N} \varepsilon_j - q_n(f_\omega, g_\omega) \right).$

Given $I$, the random variables in the brackets are conditionally independent,
thus uncorrelated. Hence we conclude that

    $E_{m,l} E_\varepsilon\, |S(f_\omega, g_\omega) - q_n(f_\omega, g_\omega)|^2 \ge E_{m,l} E_\varepsilon \left| \frac{C}{m\, c_{m,l}} \sum_{j \in I \setminus N} \varepsilon_j \right|^2 = \frac{C^2}{m^2 c_{m,l}^2}\, E_{m,l}\, \#(I \setminus N) \ge \frac{C^2\, l}{2 m^2 c_{m,l}^2},$

by Lemma 1. In the case $m \ge C - 1$ we obtain $l \ge m/C$ and have
$c_{m,l} \le 3$, such that

    $E_{m,l} E_\varepsilon\, |S(f, g) - q_n(f, g)|^2 \ge \frac{C}{36 n},$

which in turn yields the first case bound in (8). In the other case $m < C - 1$
the value of $l = 1$ yields the second bound in (8). □
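As a plausibility check of the two cases at the end of this proof, the following Python sketch evaluates the resulting lower bounds: $\sqrt{C/(36n)}$ for $2n \ge C - 1$, and $C / (\sqrt{2}\,(C - 1 + 2n))$ for $2n < C - 1$, the latter obtained from $l = 1$, $c_{m,1} = (C - 1 + m)/m$ and Lemma 1. The function name `lower_bound` is hypothetical.

```python
import math


def lower_bound(C, n):
    """Lower bound on the error of any Monte Carlo method using n values, per the proof."""
    m = 2 * n
    if m >= C - 1:
        # First case: squared average error at least C / (36 n).
        return math.sqrt(C / (36 * n))
    # Second case: l = 1 and c_{m,1} = (C - 1 + m) / m.
    return C / (math.sqrt(2) * (C - 1 + m))


# The regime discussed in the introduction: C = 1e20 and n = 1e15.
bound = lower_bound(1e20, 10 ** 15)
```

For $C = 10^{20}$ and $n = 10^{15}$ the bound stays above 0.7, in line with the statement that such a sample size cannot push the error of optimal methods below 0.7.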



3.2. The error of the simple Monte Carlo method. The direct approach to
evaluate (1) would be to use the method $S_n^{simple}$ from (3). We will prove an
upper bound for the error of this method, and we start with the following

Lemma 2. If the function $g$ obeys the requirements in $\mathcal{F}_C(\Omega)$,
then

(1) $0 < \inf_{x \in \Omega} g(x) \le \sup_{x \in \Omega} g(x) < \infty$.

(2) For every probability measure $\mu$ on $\Omega$, we have
    $\|g\|_{2,\mu} \le \sqrt{C}\, \|g\|_{1,\mu}$.

Proof. To prove the first assertion, fix any $y_0 \in \Omega$. Then the assumption
on $g$ yields $g(x) \le C g(y_0)$, and reversing the roles of $x$ and $y_0$ also
the lower bound. Now both, the assumption on $g$ as well as the second assertion,
are invariant with respect to multiplication of $g$ by a constant. In the light of
the first assertion we may and do assume that $1 \le g(x) \le C$, $x \in \Omega$,
and we derive, using $1 \le \int_\Omega g(x)\, \mu(dx)$, that

    $\int_\Omega g^2(x)\, \mu(dx) \le C \int_\Omega g(x)\, \mu(dx) \le C \left( \int_\Omega g(x)\, \mu(dx) \right)^2,$

completing the proof of the second assertion and of the lemma. □
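The norm inequality of Lemma 2 is easy to test numerically. The Python sketch below draws hypothetical density values with $\sup g / \inf g \le C$ and compares the two norms with respect to the uniform probability on the sample points; the value of `C`, the way the values are generated, and the helper name `norms` are illustrative assumptions.

```python
import math
import random


def norms(values):
    """L1 and L2 norms of g w.r.t. the uniform probability on its sample points."""
    count = len(values)
    l1 = sum(values) / count
    l2 = math.sqrt(sum(v * v for v in values) / count)
    return l1, l2


rng = random.Random(1)
C = 50.0
# Hypothetical density values in [2, 2 * C), so that sup g / inf g <= C.
g_values = [2.0 * C ** rng.random() for _ in range(1000)]

l1, l2 = norms(g_values)
ratio = max(g_values) / min(g_values)
```

The scaling factor 2 plays no role, which reflects the invariance under multiplication of $g$ by a constant used in the proof.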

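We turn to the bound for the simple Monte Carlo method.

Theorem 2. For all $n \in \mathbb{N}$ we have

(12)    $e(S_n^{simple}, \mathcal{F}_C(\Omega)) \le 2 \min\left\{ 1, \sqrt{\frac{2C}{n}} \right\}.$

Proof. The upper bound 2 is trivial, it even holds deterministically. Fix any pair
$(f, g)$ of input. For any sample $(X_1, \dots, X_n)$ and function $g$ we denote
the sample mean by $S_n^{mean}(g) := \frac{1}{n} \sum_{j=1}^n g(X_j)$. It is well
known that $e(S_n^{mean}, g) \le \|g\|_2 / \sqrt{n}$. With this notation we can
bound

    $|S(f, g) - S_n^{simple}(f, g)| \le \left| \frac{\int_\Omega f(x) g(x)\, \mu(dx)}{\int_\Omega g(x)\, \mu(dx)} - \frac{S_n^{mean}(fg)}{\int_\Omega g(x)\, \mu(dx)} \right| + \left| \frac{S_n^{mean}(fg)}{\int_\Omega g(x)\, \mu(dx)} - \frac{S_n^{mean}(fg)}{S_n^{mean}(g)} \right|$

    $\le \frac{1}{\|g\|_1} \left| \int_\Omega f(x) g(x)\, \mu(dx) - S_n^{mean}(fg) \right| + \left| \frac{S_n^{mean}(fg)}{S_n^{mean}(g)} \right| \cdot \frac{1}{\|g\|_1} \left| S_n^{mean}(g) - \int_\Omega g(x)\, \mu(dx) \right|$

    $\le \frac{1}{\|g\|_1} \left( \left| \int_\Omega f(x) g(x)\, \mu(dx) - S_n^{mean}(fg) \right| + \left| \int_\Omega g(x)\, \mu(dx) - S_n^{mean}(g) \right| \right),$

where we used $|S_n^{mean}(fg) / S_n^{mean}(g)| \le 1$, which holds true since the
enumerator and denominator use the same sample. This yields the following error
bound

    $e(S_n^{simple}, (f, g)) \le \frac{\sqrt{2}}{\|g\|_1} \left( e(S_n^{mean}, fg) + e(S_n^{mean}, g) \right) \le \frac{\sqrt{2}}{\|g\|_1 \sqrt{n}} \left( \|fg\|_2 + \|g\|_2 \right) \le \frac{2\sqrt{2}}{\sqrt{n}} \cdot \frac{\|g\|_2}{\|g\|_1} \le 2 \sqrt{\frac{2C}{n}},$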
where we use Lemma 2. Taking the supremum over
$(f, g) \in \mathcal{F}_C(\Omega)$ allows to complete the proof. □

4. Analysis for $\mathcal{F}_\alpha(\Omega)$

In this section we impose restrictions on the input data, in particular on the
density, in order to improve the complexity. This class is still large enough to
contain many important situations. Monte Carlo methods for problems when the
target (invariant) distribution is log-concave proved to be important in many
studies; we refer to [10]. One of the main intrinsic features of such classes of
distributions are isoperimetric inequalities, see [2, 13], which will also be used
here in the form as used in [29]. Recall that here we always require that
$\Omega \subset \mathbb{R}^d$ is a convex body, as introduced in Section 2.

We start with a lower bound for all non-adaptive algorithms to exhibit that simple
Monte Carlo cannot take into account the additional structure of the underlying
class of input data and adaptive methods should be used. This bound, together
with Theorem will show that adaptive methods can outperform any non-adaptive
method, if we consider $S$ on $\mathcal{F}_\alpha(B^d)$. Indeed, we also show that
specific Metropolis algorithms, based on local underlying Markov chains, are
suited for this problem class.

4.1. A lower bound for non-adaptive methods. Here we prove a lower bound
for all non-adaptive methods (hence in particular for the simple Monte Carlo
method) for the problem on the classes $\mathcal{F}_\alpha(\Omega)$. Again, this
lower bound will use Bahvalov's technique.

We start with a result on sphere packings. The Minkowski-Hlawka theorem,
see [25], says that the density of the densest sphere packing in $\mathbb{R}^d$ is
at least $\zeta(d) \cdot 2^{1-d} \ge 2^{1-d}$. It is also known, see [UJ, that the
density (by definition of the whole $\mathbb{R}^d$) can be replaced by the density
within a convex body $\Omega$, as long as the radius $r$ of the spheres tends to
zero. Hence we obtain the following result.

Lemma 3. There is $n_0 \in \mathbb{N}$ such that for all $m \ge n_0$ there are
points $y_1, \dots, y_m \in \Omega$ such that with



the closed balls $B_i := B(y_i, r) \subset \Omega$ are disjoint.

Our construction will use such points $y_1, \dots, y_m \in \Omega$ and the
corresponding balls $B_1, \dots, B_m$ as follows.

For $i \in \{1, \dots, m\}$ we assign

    $g_i(y) := c_i \exp(-\alpha \|y - y_i\|_2), \quad y \in \Omega,$ and
    $f_i(y) := \tilde{c}_i\, \chi_{B_i}(y), \quad y \in \Omega,$

with constants $c_i$ and $\tilde{c}_i$ chosen such that
$\int_\Omega g_i(y)\, dy = 1$ and $\|f_i\|_{2,g_i} = 1$.
The corresponding values of the mapping $S$ are computed as

(13)    $S(f_i, g_i) = \int_\Omega f_i\, g_i\, dy = \tilde{c}_i c_i \int_{B_i} \exp(-\alpha \|y - y_i\|)\, dy = \left( c_i \int_{B_i} \exp(-\alpha \|y - y_i\|)\, dy \right)^{1/2} = \left( c_i \int_{B(0,r)} \exp(-\alpha \|y\|)\, dy \right)^{1/2} = \left( \frac{\int_{B(0,r)} \exp(-\alpha \|y\|)\, dy}{\int_\Omega \exp(-\alpha \|y - y_i\|)\, dy} \right)^{1/2}.$

Again we turn to the average case setting, this time with probability measure
$\mu_{2n}$ being the equidistribution on the set

    $\mathcal{F}_{2n} := \{(\varepsilon_i f_i, g_i),\ i = 1, \dots, 2n,\ \varepsilon_i = \pm 1\} \subset \mathcal{F}_\alpha(\Omega).$

Similar to (10) we have for any non-adaptive Monte Carlo method $S_n$ the
relation

    $e(S_n, \mathcal{F}_\alpha(\Omega)) \ge \min\{ e^{avg}(q_n, \mu_{2n}),\ q_n \text{ is deterministic and non-adaptive} \},$

where $e^{avg}(q_n, \mu_{2n})$ denotes the average case error of the deterministic
non-adaptive method $q_n$ with respect to the probability $\mu_{2n}$. Thus let
$q_n$ be any non-adaptive (deterministic) algorithm for $S$ on the class
$\mathcal{F}_\alpha(\Omega)$ that uses at most $n$ values. The average case error
can then be bounded from below as

    $E_{\mu_{2n}} |S(f, g) - q_n(f, g)|^2 = \frac{1}{2n} \sum_{i=1}^{2n} E_\varepsilon\, |S(\varepsilon_i f_i, g_i) - q_n(\varepsilon_i f_i, g_i)|^2$

    $\ge \frac{1}{2} \min_{i=1,\dots,2n} E_\varepsilon\, |S(\varepsilon_i f_i, g_i)|^2 \ge \frac{1}{2} \min_{i=1,\dots,2n} S(f_i, g_i)^2.$

Above, $E_\varepsilon$ denotes the expectation with respect to the independent
random variables $\varepsilon_i = \pm 1$. Together with (13) we obtain

    $e(S_n, \mathcal{F}_\alpha(\Omega)) \ge \frac{1}{2} \sqrt{2}\, \min_{i=1,\dots,2n} \left( \frac{\int_{B(0,r)} \exp(-\alpha \|y\|)\, dy}{\int_\Omega \exp(-\alpha \|y - y_i\|)\, dy} \right)^{1/2}.$

We bound the enumerator from below and the denominator from above. For
$\alpha r \le \log 2$ we can bound

    $\int_{B(0,r)} \exp(-\alpha \|y\|)\, dy \ge \frac{1}{2} \mathrm{vol}(B(0, r)) = \frac{1}{2} r^d\, \mathrm{vol}(B^d).$

For the denominator we have

    $\int_\Omega \exp(-\alpha \|y - y_i\|)\, dy \le \int_{\mathbb{R}^d} \exp(-\alpha \|y - y_i\|)\, dy = \alpha^{-d} \int_{\mathbb{R}^d} \exp(-\|y\|)\, dy = \alpha^{-d}\, \Gamma(d)\, \mathrm{vol}(\partial B^d),$

such that we finally obtain, using the well known formula
$\mathrm{vol}(\partial B^d) = d\, \mathrm{vol}(B^d)$, that

    $e(S_n, \mathcal{F}_\alpha(\Omega)) \ge \frac{1}{2} \left( \frac{(\alpha r)^d}{d!} \right)^{1/2}.$
Using the value for $r = r(\Omega, 2n)$ from Lemma 3 we end up with

Theorem 3. Assume that $S_n$ is any non-adaptive Monte Carlo method for the class
$\mathcal{F}_\alpha(\Omega)$. Then, with $n_0$ from Lemma 3, we have for all

2n > max inn, (a/log 4) d • ™^g d 



1/2 a d ' 2 



1/2 



that 

(14) e (^<n))>2^.g«) ~n- 

Remark 5. For fixed $d$ this is a lower bound of the form
$e(S_n) \ge c_d\, \alpha^{d/2} n^{-1/2}$. It is interesting only if $\alpha$ is
"large", otherwise the already mentioned lower bound $(1 + \sqrt{n})^{-1}$ is
better.

We stress that in the above reasoning we essentially used the non-adaptivity of
the method $S_n$. Indeed, if $S_n$ were adaptive, then by just one appropriate
function value $g(x)$, we could identify the index $i$, since the functions $g_i$
are global. Then, knowing $i$, we could ask for the value of $\varepsilon_i$ and
would obtain the exact solution to $S(f, g)$ for this small class
$\mathcal{F}_{2n}$ for all $n \ge 2$.

4.2. Metropolis method with local underlying walk. The Metropolis algorithm
we consider here has a specific routine Step in Figure 1, whereas the final
step Compute is exactly as given in (5). It is based on a specific ball walk and
this version is sometimes called ball walk with Metropolis filter, see [29]. Two
concepts from the theory of Markov chains turn out to be important, reversibility
and uniform ergodicity. We recall these notions briefly, see [21] for further
details. A Markov chain $(K, \pi)$ is reversible with respect to $\pi$, if for all
measurable subsets $A, B \subset \Omega$ the balance

(15)    $\int_A K(x, B)\, \pi(dx) = \int_B K(x, A)\, \pi(dx)$

holds true. Notice that in this case necessarily $\pi$ is an invariant
distribution.

A Markov chain is uniformly ergodic if there are $n_0 \in \mathbb{N}$, a constant
$c > 0$ and a probability measure $\nu$ on $\Omega$ such that

(16)    $K^{n_0}(x, A) \ge c\, \nu(A)$, for all $A \subset \Omega$ and $x \in \Omega$.

Markov chains which are uniformly ergodic have a unique invariant probability
distribution.

Our analysis will be based on conductance arguments and we recall the basic
notions, see [12, 16]. If $(K, \pi)$ is a Markov chain with transition kernel $K$
and invariant distribution $\pi$ then we assign the

(1) local conductance at $x \in \Omega$ by $l_K(x) := K(x, \Omega \setminus \{x\})$,

(2) and the conductance as

    $\varphi(K, \pi) := \inf_{0 < \pi(A) < 1} \frac{\int_A K(x, A^c)\, \pi(dx)}{\min\{\pi(A), \pi(A^c)\}},$

where $A^c = \Omega \setminus A$.

Below we call $l > 0$ a lower bound for the local conductance, if $l_K(x) \ge l$
for all $x \in \Omega$.

The ball walk and some of its properties. Here we gather some properties of the
ball walk, see [TBI |2"U], which will serve as ingredients for the analysis of
Metropolis chains using this as the underlying proposal. In particular we prove
that on convex bodies in $\mathbb{R}^d$ the ball walk is uniformly ergodic and we
bound its conductance from below, in terms of bounds $l > 0$ for the local
conductance.

We abbreviate $B(0, \delta) = \delta B^d$. Let $Q_\delta$ be the transition kernel
of a local random walk having transitions within $\delta$-balls of its current
position, i.e., we let

(18)    $Q_\delta(x, \{x\}) := 1 - \frac{\mathrm{vol}(B(x, \delta) \cap \Omega)}{\mathrm{vol}(\delta B^d)}$

and

(19)    $Q_\delta(x, A) := \begin{cases} \frac{\mathrm{vol}(B(x, \delta) \cap A)}{\mathrm{vol}(\delta B^d)}, & A \subset \Omega \text{ and } x \notin A, \\ Q_\delta(x, A \setminus \{x\}) + Q_\delta(x, \{x\}), & A \subset \Omega \text{ and } x \in A. \end{cases}$

Schematically, the transition kernel may be viewed as in Figure 2.



Procedure Ball-walk-step($x$, $\delta$)
Input : current position $x$; $\delta > 0$;
Output: next position;
Propose: Choose $y \in B(x, \delta)$ uniformly;
Accept: if $y \in \Omega$ then
    return $y$;
else
    return $x$;
end

Figure 2. Schematic view of ball walk step
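The ball walk step of Figure 2 translates almost line by line into code. The following Python sketch is an illustration under stated assumptions: $\Omega$ is taken to be the unit ball $B^d$ with $d = 3$, the uniform point in a ball is generated from a normalized Gaussian direction and a radius $\delta U^{1/d}$, and all function names are hypothetical. The last lines estimate the local conductance at a boundary point as the fraction of proposals that land in $\Omega$.

```python
import math
import random


def uniform_in_ball(center, delta, rng):
    """Uniform point in the Euclidean ball B(center, delta)."""
    d = len(center)
    direction = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in direction))
    radius = delta * rng.random() ** (1.0 / d)
    return [c + radius * v / norm for c, v in zip(center, direction)]


def ball_walk_step(x, delta, in_omega, rng):
    """One step of Figure 2: propose y uniformly in B(x, delta); stay at x if y leaves Omega."""
    y = uniform_in_ball(x, delta, rng)
    return y if in_omega(y) else x


def in_unit_ball(y):
    """Membership oracle for Omega = B^d (an illustrative choice of Omega)."""
    return sum(v * v for v in y) <= 1.0


d = 3
delta = 1.0 / math.sqrt(d + 1)
rng = random.Random(2)

# Run the walk for a while; every state stays inside Omega by construction.
x = [0.0] * d
for _ in range(1000):
    x = ball_walk_step(x, delta, in_unit_ball, rng)

# Monte Carlo estimate of the local conductance at a boundary point of B^d.
boundary = [1.0] + [0.0] * (d - 1)
trials = 20_000
hits = sum(in_unit_ball(uniform_in_ball(boundary, delta, rng)) for _ in range(trials))
local_conductance = hits / trials
```

Only the membership oracle for $\Omega$ and a uniform sampler on a ball are needed; no sampler for $\mu$ on $\Omega$ is used.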

Clearly we may restrict to $\delta \le D$, the diameter of $\Omega$. The following
observation is important and explains why we restrict ourselves to convex bodies.

Lemma 4. If $\Omega \subset \mathbb{R}^d$ is a convex body, then the ball walk
$Q_\delta$ has a (non-trivial) lower bound $l > 0$ for the local conductance.

Proof. It is well-known that convex bodies satisfy the cone condition (see 0,
§ 3.2, Lemma 3]). Therefore we obtain that for each $\delta > 0$ there is $l > 0$
such that for each $x \in \Omega$ we have $l_{Q_\delta}(x) \ge l$. □

Remark 6. Observe however, that $l$ might be very small. For
$\Omega = [0, 1]^d$, for example, we get $l = 2^{-d}$, even if $\delta$ is very
small. In contrast, we will see that a large $l$ is possible for $\Omega = B^d$
and $\delta \le 1/\sqrt{d + 1}$, see Lemma 7.

Notice that
$l_{Q_\delta}(x) = \mathrm{vol}(B(x, \delta) \cap \Omega) / \mathrm{vol}(\delta B^d)$,
hence in the following we use the inequality

(20)    $\mathrm{vol}(B(x, \delta) \cap \Omega) \ge l \cdot \mathrm{vol}(\delta B^d),$

where $l > 0$ is a lower bound for the local conductance of the ball walk.

The following result is folklore, but for a lack of reference we sketch a proof.

Proposition 1. The ball walk $Q_\delta$ is reversible with respect to the uniform
distribution $\mu_\Omega$ and uniformly ergodic.

The crucial tool for proving this is provided by the notion of small and petite
sets, where we refer to [19, Sect. 5.2 & 5.5] for details and properties. To this
end we introduce a sampled chain, say $(Q_\delta)_a$, where $a$ is some
probability $a = (a_0, a_1, \dots)$ on $\{0, 1, 2, \dots\}$ and $(Q_\delta)_a$ is
defined by $(Q_\delta)_a(x, C) := \sum_{j=0}^\infty a_j\, Q_\delta^j(x, C)$. We
recall that a (measurable) subset $C \subset \Omega$ is petite (for $Q_\delta$),
if there are a probability $a$, a constant $\varepsilon > 0$ and a probability
measure $\nu$ on $\Omega$ such that

(21)    $(Q_\delta)_a(y, A) \ge \varepsilon\, \nu(A), \quad A \subset \Omega,\ y \in C.$

A set $C \subset \Omega$ is small, if the same property holds true for some Dirac
probability $a := \delta_n$, such that obviously small sets are petite. We first
show that certain balls are small.

Lemma 5. The sets $B(x, \delta/2) \cap \Omega$, $x \in \Omega$, are small for
$Q_\delta$.

Proof. First, we note that $y \in B(x, \delta/2)$ implies
$B(x, \delta/2) \subset B(y, \delta)$. Let $l > 0$ be a lower bound for the local
conductance of $Q_{\delta/2}$. Using (20) for $Q_{\delta/2}$, we obtain for any
set $A \subset \Omega$ that

    $Q_\delta(y, A) \ge Q_\delta(y, A \setminus \{y\}) = \frac{\mathrm{vol}(B(y, \delta) \cap A)}{\mathrm{vol}(\delta B^d)} \ge 2^{-d}\, \frac{\mathrm{vol}(B(x, \delta/2) \cap A)}{\mathrm{vol}((\delta/2) B^d)} \ge l \cdot 2^{-d}\, \frac{\mathrm{vol}(A \cap B(x, \delta/2) \cap \Omega)}{\mathrm{vol}(B(x, \delta/2) \cap \Omega)}.$

Hence estimate (21) holds true with $n := 1$, $\varepsilon := l \cdot 2^{-d}$ and

    $\nu(A) := \frac{\mathrm{vol}(A \cap B(x, \delta/2) \cap \Omega)}{\mathrm{vol}(B(x, \delta/2) \cap \Omega)}.$

This completes the proof. □

Proof of Proposition 1. We first prove reversibility with respect to
$\mu_\Omega$. Notice that it is enough to verify (15) for disjoint sets
$A, B \subset \Omega$. Furthermore we observe that for any pair
$A, B \subset \Omega$ of measurable subsets the characteristic function of the set

    $\{(x, y) \in \Omega \times \Omega,\ x \in A,\ y \in B,\ \|x - y\| \le \delta\}$

can equivalently be rewritten as

    $\chi_B(y)\, \chi_{B(y, \delta) \cap A}(x) \quad\text{or}\quad \chi_A(x)\, \chi_{B(x, \delta) \cap B}(y).$

Hence, letting temporarily $c := \mathrm{vol}(\Omega)\, \mathrm{vol}(\delta B^d)$
we obtain

    $\int_A Q_\delta(x, B)\, \mu_\Omega(dx) = \frac{1}{c} \int_A \mathrm{vol}(B(x, \delta) \cap B)\, dx = \frac{1}{c} \int\!\!\int \chi_A(x)\, \chi_{B(x, \delta) \cap B}(y)\, dy\, dx$

    $= \frac{1}{c} \int\!\!\int \chi_B(y)\, \chi_{B(y, \delta) \cap A}(x)\, dx\, dy = \int_B Q_\delta(y, A)\, \mu_\Omega(dy),$

proving reversibility.

By Lemma 5 each set $B(x, \delta/2) \cap \Omega$ is small, thus also petite.
Petiteness is inherited by taking finite unions. Since $\Omega$, being compact,
can be covered by finitely many sets $B(x, \delta/2) \cap \Omega$, this implies
that $\Omega$ is petite. By [19, Thm. 16.2.2] this yields uniform ergodicity of
the ball walk (see [19, Thm. 16.0.2(v)]). □

We mention the following conductance bound of the ball walk, which is a slight
improvement of [29, Thm. 5.2]. This will be a special case of Theorem HJ below,
and we omit the proof.

Proposition 2. Let $(Q_\delta, \mu_\Omega)$ be the ball walk from above, and let
$\varphi(Q_\delta, \mu_\Omega)$ be its conductance. Let $D$ be the diameter of
$\Omega$ and let $l$ be a lower bound for the local conductance. Then

(22) ip(Q s ^ a )>J-- 



2 8DVd + T 

The local conductance may be arbitrarily small if the domain $\Omega$ has sharp
corners. For specific sets $\Omega$ we can explicitly provide lower bounds for the
local conductance, and this will be used in the later convergence analysis. In the
following we mainly discuss the case $\Omega = B^d$.

We start with a technical result, related to the Gamma function on
$\mathbb{R}_+$. We use the well-known formula

(23)    $\mathrm{vol}(B^d) = \pi^{d/2} / \Gamma(d/2 + 1).$

Lemma 6. For any $z > 0$ we have

(24)    $\Gamma(z + 1/2) \le \sqrt{z}\;\Gamma(z).$

Consequently,

(25)    $\frac{\mathrm{vol}(B^{d-1})}{\mathrm{vol}(B^d)} \le \sqrt{\frac{d+1}{2\pi}}.$

Proof. By [8, Chapt. VII, Eq. (11)] we know that the function $z \mapsto \log \Gamma(z)$ is convex
for $z > 0$. Thus we conclude

    $\log \Gamma(z + 1/2) \le \frac{1}{2}\bigl(\log \Gamma(z + 1) + \log \Gamma(z)\bigr)$
    $\qquad = \frac{1}{2}\bigl(\log z + 2 \log \Gamma(z)\bigr) = \log \sqrt{z} + \log \Gamma(z),$

from which the proof of assertion (24) can be completed. Using the representation
for the volume from (23) and applying the above bound with $z := (d + 1)/2$ we
obtain

    $\frac{\mathrm{vol}(B^{d-1})}{\mathrm{vol}(B^d)} = \frac{\Gamma(d/2 + 1)}{\sqrt{\pi}\;\Gamma((d + 1)/2)} \le \sqrt{\frac{d+1}{2\pi}},$

and the proof is complete. □
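As a quick numerical sanity check of Lemma 6, the following sketch evaluates the volume ratio $\mathrm{vol}(B^{d-1})/\mathrm{vol}(B^d) = \Gamma(d/2+1)/(\sqrt{\pi}\,\Gamma((d+1)/2))$ and compares it with the bound $\sqrt{(d+1)/(2\pi)}$. The function names are illustrative and not part of the paper; the log-Gamma function is used only to avoid overflow for large $d$.

```python
import math

def unit_ball_volume(d: int) -> float:
    """Volume of the d-dimensional Euclidean unit ball, formula (23)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def volume_ratio(d: int) -> float:
    """vol(B^{d-1}) / vol(B^d), evaluated through log-Gamma for stability."""
    log_ratio = (math.lgamma(d / 2 + 1)
                 - math.lgamma((d + 1) / 2)
                 - 0.5 * math.log(math.pi))
    return math.exp(log_ratio)

def ratio_bound(d: int) -> float:
    """Upper bound sqrt((d + 1) / (2 pi)) on the volume ratio."""
    return math.sqrt((d + 1) / (2 * math.pi))

if __name__ == "__main__":
    # Print the ratio next to its bound for a few dimensions.
    for d in (1, 2, 3, 10, 100, 1000):
        print(d, volume_ratio(d), ratio_bound(d))
```

The ratio and the bound both grow like $\sqrt{d/(2\pi)}$, so the estimate is asymptotically sharp.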

Using Lemma 6 we can prove the following lower bound for the local conductance
of the ball walk on $B^d$.

Lemma 7. Let $(Q_\delta, \mu_\Omega)$ be the local ball walk on $B^d \subset \mathbb{R}^d$. If $\delta \le 1/\sqrt{d+1}$, then
its local conductance obeys $\ell \ge 0.3$.

Proof. The proof is based on some geometric reasoning. It is clear that the local
conductance $\ell(x)$ is minimal for points $x$ at the boundary of $B^d$, and in this case
its value equals the portion, say $V$, of the volume of $B(x,\delta)$ inside $B^d$. If $H$ is the
tangent hyperplane at $x$ to $B^d$, then this cuts off from $B(x,\delta)$ exactly one half of its
volume. Thus we let $Z(h)$ be the cylinder with base being the $(d-1)$-ball around $x$ in
the hyperplane $H$ of radius $\delta$. Its height $h$ is the distance of $H$ to the hyperplane
determined by the intersection of the boundaries of $B^d$ and $B(x,\delta)$. This height $h$ is
exactly determined from the quotient $h/\delta = \delta/2$, by similarity, hence $h := \delta^2/2$.
By construction we have $V \ge 1/2 - \mathrm{vol}(Z(h))/\mathrm{vol}(B(x,\delta))$ and we can lower bound
the local conductance $\ell(x)$ by

    $\ell(x) \ge \frac{1}{2} - \frac{\mathrm{vol}(Z(h))}{\mathrm{vol}(B(x,\delta))}.$

We can evaluate $\mathrm{vol}(Z(h))$ as $\mathrm{vol}(Z(h)) = h\,\delta^{d-1}\,\mathrm{vol}(B^{d-1})$, and we obtain

    $\ell(x) \ge \frac{1}{2} - \frac{\delta^{d+1}\,\mathrm{vol}(B^{d-1})}{2\,\delta^{d}\,\mathrm{vol}(B^d)} = \frac{1}{2}\Bigl(1 - \frac{\delta\,\mathrm{vol}(B^{d-1})}{\mathrm{vol}(B^d)}\Bigr).$

The bound (25) from Lemma 6 implies

    $\ell \ge \frac{1}{2}\Bigl(1 - \frac{\delta \sqrt{d+1}}{\sqrt{2\pi}}\Bigr).$

For $\delta \le 1/\sqrt{d+1}$ we get $\ell \ge \frac{1}{2}\bigl(1 - 1/\sqrt{2\pi}\bigr) > 0.3$, completing the proof. □
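The geometric argument can be checked empirically. The sketch below estimates, by plain Monte Carlo, the fraction of the ball $B(x,\delta)$ that lies inside $B^d$ for a boundary point $x$, and compares it with the analytic lower bound $\frac{1}{2}(1 - \delta\sqrt{d+1}/\sqrt{2\pi})$. All function names, the sample size and the seed are illustrative assumptions, not part of the paper.

```python
import math
import random

def sample_in_ball(center, radius, rng):
    """Draw a point uniformly from the Euclidean ball B(center, radius)."""
    d = len(center)
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]       # random direction
    norm = math.sqrt(sum(v * v for v in g))
    r = radius * rng.random() ** (1.0 / d)            # radius with density ~ r^(d-1)
    return [c + r * v / norm for c, v in zip(center, g)]

def local_conductance_at_boundary(d, delta, n_samples=20000, seed=0):
    """Monte Carlo estimate of vol(B(x, delta) ∩ B^d) / vol(B(x, delta))
    for the boundary point x = (1, 0, ..., 0)."""
    rng = random.Random(seed)
    x = [1.0] + [0.0] * (d - 1)
    hits = 0
    for _ in range(n_samples):
        y = sample_in_ball(x, delta, rng)
        if sum(v * v for v in y) <= 1.0:
            hits += 1
    return hits / n_samples

def lemma7_lower_bound(d, delta):
    """Analytic lower bound (1/2) * (1 - delta * sqrt(d + 1) / sqrt(2 pi))."""
    return 0.5 * (1.0 - delta * math.sqrt(d + 1) / math.sqrt(2.0 * math.pi))

if __name__ == "__main__":
    for d in (2, 3, 5):
        delta = 1.0 / math.sqrt(d + 1)
        print(d, local_conductance_at_boundary(d, delta), lemma7_lower_bound(d, delta))
```

In low dimensions the estimated fraction is noticeably larger than the bound, which reflects the crude replacement of the cut-off cap by the cylinder $Z(h)$.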

We close this subsection with the following technical lemma, which can be
extracted from the unpublished seminar note [28]. For the convenience of the reader
we present its proof. In addition we will slightly improve the statement.

Lemma 8. Let $\ell > 0$ be a lower bound for the local conductance of the ball walk
$(Q_\delta, \mu_\Omega)$. For any $0 < t < \ell$ and any set $A \subset \Omega$ with related sets

(26)    $A_1 := \bigl\{x \in A,\ Q_\delta(x, A^c) < \tfrac{\ell - t}{2}\bigr\} \subset A$

and

(27)    $A_2 := \bigl\{y \in A^c,\ Q_\delta(y, A) < \tfrac{\ell - t}{2}\bigr\}$

we have $d(A_1, A_2) > t\,\delta \sqrt{2\pi/(d+1)}$.

For its proof we need the following

Lemma 9. Let $\delta > 0$. If $x, y \in \mathbb{R}^d$ are two points with distance $t\,\delta \sqrt{2\pi/(d+1)}$ at
most, then

(28)    $\mathrm{vol}(B(x,\delta) \cap B(y,\delta)) \ge (1 - t)\,\mathrm{vol}(\delta B^d).$

Proof. Let $u := \|x - y\|_2$. If $u \le \delta$ then the volume of the intersection of $B(x,\delta)$
and $B(y,\delta)$ is exactly the same as the volume of the ball $\delta B^d$ minus the volume of
the middle slice with distance $u$ as thickness. The volume of this slice is bounded
from above by the volume of the cylinder with base $\delta B^{d-1}$ and thickness $u$. Thus
we obtain

    $\mathrm{vol}(B(x,\delta) \cap B(y,\delta)) \ge \mathrm{vol}(\delta B^d) - u\,\mathrm{vol}(\delta B^{d-1}) = \mathrm{vol}(\delta B^d)\Bigl(1 - u\,\frac{\mathrm{vol}(\delta B^{d-1})}{\mathrm{vol}(\delta B^d)}\Bigr).$

Applying Lemma 6 we obtain

    $\frac{\mathrm{vol}(\delta B^{d-1})}{\mathrm{vol}(\delta B^d)} = \frac{\mathrm{vol}(B^{d-1})}{\delta\,\mathrm{vol}(B^d)} \le \frac{1}{\delta} \sqrt{\frac{d+1}{2\pi}},$

thus by the choice of $u \le \sqrt{2\pi}\,t\,\delta / \sqrt{d+1}$ we conclude that

    $u\,\frac{\mathrm{vol}(\delta B^{d-1})}{\mathrm{vol}(\delta B^d)} \le \frac{\sqrt{2\pi}\,t\,\delta\,\sqrt{d+1}}{\delta\,\sqrt{2\pi}\,\sqrt{d+1}} = t,$

and the proof is complete. □

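We turn to the

Proof of Lemma 8. Let $x \in A_1$ and $y \in A_2$ be in $\Omega$, and suppose that their distance
is at most $t\,\delta \sqrt{2\pi/(d+1)}$. Simple set theoretic reasoning shows that

    $\mathrm{vol}(B(x,\delta) \cap B(y,\delta) \cap \Omega) \ge \mathrm{vol}(B(x,\delta) \cap \Omega) - \mathrm{vol}(B(x,\delta) \setminus B(y,\delta))$
    $\qquad \ge \mathrm{vol}(B(x,\delta) \cap \Omega) - \mathrm{vol}\bigl(B(x,\delta) \setminus (B(x,\delta) \cap B(y,\delta))\bigr)$
    $\qquad = \mathrm{vol}(B(x,\delta) \cap \Omega) - \mathrm{vol}(\delta B^d) + \mathrm{vol}(B(x,\delta) \cap B(y,\delta)).$

Since $\ell$ is a lower bound for the local conductance $\ell(x)$ we have that

    $\mathrm{vol}(B(x,\delta) \cap \Omega) \ge \ell\,\mathrm{vol}(B(x,\delta)) = \ell\,\mathrm{vol}(\delta B^d).$

Taking this into account and using (28) we end up with

    $\mathrm{vol}(B(x,\delta) \cap B(y,\delta) \cap \Omega) \ge \ell\,\mathrm{vol}(\delta B^d) - \mathrm{vol}(\delta B^d) + (1 - t)\,\mathrm{vol}(\delta B^d)$
    $\qquad = (\ell - t)\,\mathrm{vol}(\delta B^d).$

In probabilistic terms this rewrites as $Q_\delta(x, B(x,\delta) \cap B(y,\delta) \cap \Omega) \ge \ell - t$, and
similarly $Q_\delta(y, B(x,\delta) \cap B(y,\delta) \cap \Omega) \ge \ell - t$. Now, if $A \subset \Omega$ is any measurable
subset with complement $A^c$ then for $x \in A$ and $y \in A^c$ we obtain

    $B(x,\delta) \cap B(y,\delta) \cap \Omega \subset \bigl(B(x,\delta) \cap A^c \cap \Omega\bigr) \cup \bigl(B(y,\delta) \cap A \cap \Omega\bigr),$

which in turn yields $Q_\delta(x, A^c) + Q_\delta(y, A) \ge \ell - t$, but this contradicts the definition
of the sets $A_1$ and $A_2$. Hence any two points from $A_1$ and $A_2$, respectively, must
have distance larger than $t\,\delta \sqrt{2\pi/(d+1)}$, and the proof is complete. □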

Properties of the related Metropolis method. We analyze Metropolis Markov chains
which are based on the ball walk, introduced above, for some appropriately chosen $\delta$.
As it will turn out, the related Metropolis chains are perturbations of the underlying
ball walk, and its properties, as established in Propositions 1 and 2, extend in a
natural way.

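For $\rho \in \mathcal{R}^\alpha(\Omega)$ we define the acceptance probabilities as

(29)    $\theta(x,y) := \min\Bigl\{1, \frac{\rho(y)}{\rho(x)}\Bigr\}.$

The corresponding Metropolis kernel is given by

(30)    $K_{\rho,\delta}(x,dy) := \theta(x,y)\,Q_\delta(x,dy) + \Bigl(1 - \int \theta(x,y)\,Q_\delta(x,dy)\Bigr)\,\delta_x(dy),$

where $\delta_x$ denotes the point mass at $x$. Note that for $x \notin A$ we obtain

    $K_{\rho,\delta}(x,A) = \int_A \theta(x,y)\,Q_\delta(x,dy) = \frac{1}{\mathrm{vol}(\delta B^d)} \int_{A \cap B(x,\delta)} \theta(x,y)\,dy.$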

Below we sketch a single Metropolis step from the present position $x \in \Omega$ with
kernel $K_{\rho,\delta}(x, \cdot)$. The procedure Ball-walk-step was described in Figure 2.

Procedure Metropolis-step(x, ρ, δ)
  Input : current position x, δ > 0, function ρ;
  Output: next position;
  Propose: y := Ball-walk-step(x, δ);
  Accept:
    if ρ(y) ≥ ρ(x) then
      return y
    else if ρ(y) ≥ rand() · ρ(x) then
      return y
    else
      return x
    end

Figure 3. Schematic view of the Metropolis step. Note that the Acceptance step
results in an acceptance probability of $\theta(x,y) = \min\{1, \rho(y)/\rho(x)\}$.
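The procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the domain is taken to be the unit ball $B^d$, the ball-walk step is assumed to propose a uniform point in $B(x,\delta)$ and to stay at $x$ when the proposal leaves the domain, and the names `in_unit_ball`, `ball_walk_step`, `metropolis_step` and `metropolis_estimate` are hypothetical. The estimator averages $f$ over the positions after each of the $n$ steps, starting at zero.

```python
import math
import random

def in_unit_ball(x):
    """Membership oracle for the domain; here the unit ball (illustrative choice)."""
    return sum(v * v for v in x) <= 1.0

def ball_walk_step(x, delta, rng, in_domain=in_unit_ball):
    """Assumed ball-walk step: propose y uniformly in B(x, delta);
    move to y if it lies in the domain, otherwise stay at x."""
    d = len(x)
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    r = delta * rng.random() ** (1.0 / d)
    y = [xi + r * gi / norm for xi, gi in zip(x, g)]
    return y if in_domain(y) else x

def metropolis_step(x, rho, delta, rng):
    """One Metropolis step: accept the proposal with probability min(1, rho(y)/rho(x))."""
    y = ball_walk_step(x, delta, rng)
    if rho(y) >= rho(x):
        return y
    if rho(y) >= rng.random() * rho(x):
        return y
    return x

def metropolis_estimate(f, rho, d, delta, n, seed=0):
    """Average of f over n steps of the chain started at the origin."""
    rng = random.Random(seed)
    x = [0.0] * d
    total = 0.0
    for _ in range(n):
        x = metropolis_step(x, rho, delta, rng)
        total += f(x)
    return total / n

if __name__ == "__main__":
    alpha, d = 2.0, 3
    # Example density with log-Lipschitz constant alpha (an assumption for illustration).
    rho = lambda x: math.exp(-alpha * math.sqrt(sum(v * v for v in x)))
    delta = min(1.0 / math.sqrt(d + 1), 1.0 / alpha)
    print(metropolis_estimate(lambda x: sum(v * v for v in x), rho, d, delta, 20000))
```

Only the ratio $\rho(y)/\rho(x)$ enters the acceptance test, so the normalizing constant of the density is never needed.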

We start with the following observation. 

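Lemma 10. Let $\alpha$ be the Lipschitz constant in $\mathcal{R}^\alpha(\Omega)$ and $\beta := \exp(-\alpha\delta)$.
Uniformly for $\rho \in \mathcal{R}^\alpha(\Omega)$ the following bound for the related Metropolis chain holds
true:

(31)    $K_{\rho,\delta}(x, dy) \ge \beta\,Q_\delta(x, dy).$

Proof. Let $A \subset \Omega$. If $\mathrm{dist}(x, A) \ge \delta$ then there is nothing to prove. Otherwise, for
$y \in A \cap B(x,\delta)$ we find from the Lipschitz condition in the definition of $\mathcal{R}^\alpha(\Omega)$
and (29) that

    $\theta(x, y) \ge \exp(-\alpha \|x - y\|_2) \ge e^{-\alpha\delta} = \beta.$

By definition of the transition kernel $K_{\rho,\delta}$ from (30) we can use $\beta$ to bound

    $K_{\rho,\delta}(x, A) \ge \min\{\theta(x, y),\ y \in A \cap B(x,\delta)\}\,Q_\delta(x, A) \ge \beta\,Q_\delta(x, A).$

The proof is complete. □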
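The key inequality $\theta(x,y) \ge e^{-\alpha\delta}$ for $\|x - y\|_2 \le \delta$ is easy to test numerically. The sketch below uses the example density $\rho(x) = \exp(-\alpha\|x\|_2)$, whose logarithm is Lipschitz with constant $\alpha$ by the triangle inequality; this density and all function names are assumptions made for illustration only.

```python
import math
import random

def acceptance(rho, x, y):
    """Metropolis acceptance probability min(1, rho(y) / rho(x))."""
    return min(1.0, rho(y) / rho(x))

def min_acceptance(rho, d, delta, trials=10000, seed=0):
    """Smallest acceptance probability observed over random pairs (x, y)
    with Euclidean distance at most delta."""
    rng = random.Random(seed)
    worst = 1.0
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        r = delta * rng.random()                 # any distance in [0, delta)
        y = [xi + r * gi / norm for xi, gi in zip(x, g)]
        worst = min(worst, acceptance(rho, x, y))
    return worst

if __name__ == "__main__":
    alpha, delta, d = 3.0, 0.2, 4
    rho = lambda x: math.exp(-alpha * math.sqrt(sum(v * v for v in x)))
    print(min_acceptance(rho, d, delta), math.exp(-alpha * delta))
```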

The assertion of Proposition 1 extends to the family of Metropolis chains as
follows.

Proposition 3 (cf. [18, Prop. 1]). Let $Q_\delta$ be the ball walk from above on $\Omega$. For each
$\rho \in \mathcal{R}^\alpha(\Omega)$ and $\delta \le D$ the corresponding Metropolis chains from (30) are uniformly
ergodic and reversible with respect to the related $\mu_\rho$.

Proof. Reversibility with respect to $\mu_\rho$ is clear by the choice of the function $\theta$. To
prove uniform ergodicity, let $\beta$ be from Lemma 10 and $c$ from (16). As established in
Lemma 10 we have $K_{\rho,\delta}(x,dy) \ge \beta\,Q_\delta(x,dy)$. It is easy to see, and was established
in [18, Proof of Thm. 2], that this extends to all iterates as

    $K_{\rho,\delta}^n(x,dy) \ge \beta^n\,Q_\delta^n(x,dy).$

Recall that under the assumptions made, the ball walk is uniformly ergodic, and
from Proposition 1 we obtain $n_0$ such that for all $x \in \Omega$ we have

(32)    $K_{\rho,\delta}^{n_0}(x,A) \ge \beta^{n_0}\,c\,\nu(A), \qquad A \subset \Omega,$

proving uniform ergodicity. □

Remark 7. Notice that (32) is obtained with a right-hand side that is uniform for all
$\rho \in \mathcal{R}^\alpha(\Omega)$, a fact which will prove useful later.

Finally we prove lower bounds for the conductance of the Metropolis chains. 

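Theorem 4. Let $(K_{\rho,\delta}, \mu_\rho)$ be the Metropolis chain based on the local ball walk
$(Q_\delta, \mu_\Omega)$ and let $\varphi(K_{\rho,\delta}, \mu_\rho)$ be its conductance, where $\rho \in \mathcal{R}^\alpha(\Omega)$. Let $\ell$ be a lower
bound for the local conductance of $Q_\delta$. Then we have

(33)    $\varphi(K_{\rho,\delta}, \mu_\rho) \ge \frac{\beta\,\ell}{8}\,\min\Bigl\{\sqrt{\frac{\pi}{2}}\;\frac{\delta\,\ell}{D \sqrt{d+1}},\ 1\Bigr\},$

where $D$ is the diameter of $\Omega$ and $\beta = \exp(-\alpha\delta)$ as in Lemma 10.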

Remark 8. As mentioned above, Proposition 2 is a special case of Theorem 4 for
$\alpha = 0$.

The proof of Theorem 4 will be based on Lemma 8 for the underlying ball walk,
specifying $t := \ell/2$. This extends to the Metropolis walk as follows.

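Lemma 11. Let $\alpha$ be the Lipschitz constant of $\mathcal{R}^\alpha(\Omega)$ and let $\ell$ be a lower bound for
the local conductance of the ball walk. We let $\beta := \exp(-\alpha\delta)$. For $A \subset \Omega$ we assign

(34)    $T_1 := \bigl\{x \in A,\ K_{\rho,\delta}(x, A^c) < \tfrac{\beta\,\ell}{4}\bigr\} \subset A,$

(35)    $T_2 := \bigl\{y \in A^c,\ K_{\rho,\delta}(y, A) < \tfrac{\beta\,\ell}{4}\bigr\} \subset A^c.$

Then $d(T_1, T_2) > \delta\,\ell \sqrt{\pi/(2d+2)}$.

Proof. It is enough to prove $T_1 \subset A_1$ and $T_2 \subset A_2$, with $A_1$ and $A_2$ as in Lemma 8 for
$t := \ell/2$. If $x \in T_1$ then $K_{\rho,\delta}(x, A^c) < \beta\ell/4$, hence Lemma 10 implies

    $Q_\delta(x, A^c) \le \frac{1}{\beta}\,K_{\rho,\delta}(x, A^c) < \frac{\ell}{4}.$

The other inclusion is proved similarly. □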
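We turn to the

Proof of Theorem 4. Let $A \subset \Omega$ be the set for which the conductance is attained.
We assign sets $T_1$ and $T_2$ as in Lemma 11 and distinguish two cases. If $\mu_\rho(T_1) <
\mu_\rho(A)/2$ or $\mu_\rho(T_2) < \mu_\rho(A^c)/2$, then the estimate (33) follows easily. For instance,
if $\mu_\rho(T_1) < \mu_\rho(A)/2$ then

    $\int_A K_{\rho,\delta}(x, A^c)\,\mu_\rho(dx) \ge \int_{A \setminus T_1} K_{\rho,\delta}(x, A^c)\,\mu_\rho(dx)$
    $\qquad \ge \frac{\beta\ell}{4}\,\mu_\rho(A \setminus T_1) \ge \frac{\beta\ell}{8}\,\mu_\rho(A) \ge \frac{\beta\ell}{8}\,\min\{\mu_\rho(A), \mu_\rho(A^c)\},$

thus $\varphi(K_{\rho,\delta}, \mu_\rho) \ge \beta\ell/8$ in this case, which proves (33).

Otherwise we have $\mu_\rho(T_1) \ge \mu_\rho(A)/2$ and $\mu_\rho(T_2) \ge \mu_\rho(A^c)/2$. In this case we
apply an isoperimetric inequality, see [29, Thm. 4.2], to the triple $(T_1, T_2, T_3)$ with
$T_3 := \Omega \setminus (T_1 \cup T_2)$ to conclude that

(36)    $\mu_\rho(T_3) \ge \frac{2\,d(T_1, T_2)}{D}\,\min\{\mu_\rho(T_1), \mu_\rho(T_2)\},$

hence under the size constraints in this case it holds true that

(37)    $\mu_\rho(T_3) \ge \frac{d(T_1, T_2)}{D}\,\min\{\mu_\rho(A), \mu_\rho(A^c)\}.$

Using the reversibility of the Metropolis chain $(K_{\rho,\delta}, \mu_\rho)$ we have

    $\int_A K_{\rho,\delta}(x, A^c)\,\mu_\rho(dx) = \int_{A^c} K_{\rho,\delta}(y, A)\,\mu_\rho(dy),$

which implies

    $\int_A K_{\rho,\delta}(x, A^c)\,\mu_\rho(dx) = \frac{1}{2}\Bigl(\int_A K_{\rho,\delta}(x, A^c)\,\mu_\rho(dx) + \int_{A^c} K_{\rho,\delta}(y, A)\,\mu_\rho(dy)\Bigr)$
    $\qquad \ge \frac{1}{2}\Bigl(\int_{A \cap T_3} K_{\rho,\delta}(x, A^c)\,\mu_\rho(dx) + \int_{A^c \cap T_3} K_{\rho,\delta}(y, A)\,\mu_\rho(dy)\Bigr)$
    $\qquad \ge \frac{1}{2}\Bigl(\frac{\beta\ell}{4}\,\mu_\rho(A \cap T_3) + \frac{\beta\ell}{4}\,\mu_\rho(A^c \cap T_3)\Bigr)$
    $\qquad = \frac{\beta\ell}{8}\bigl(\mu_\rho(A \cap T_3) + \mu_\rho(A^c \cap T_3)\bigr) = \frac{\beta\ell}{8}\,\mu_\rho(T_3).$

Since by Lemma 11 we can bound $d(T_1, T_2) > \delta\,\ell \sqrt{\pi/(2d+2)}$ we use (37) to
complete the proof. □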

If we restrict ourselves to Metropolis chains on $B^d$, then Lemma 7 provides a
lower bound for the local conductance which is independent of the dimension $d$. As
a simple consequence of Theorem 4 we then obtain the following

Corollary 1. Assume that $\rho \in \mathcal{R}^\alpha(B^d)$ and $\delta \le (d+1)^{-1/2}$. Then we obtain

    $\varphi(K_{\rho,\delta}, \mu_\rho) \ge \sqrt{\frac{\pi}{2}}\;\frac{9\,\delta}{1600 \sqrt{d+1}}\;e^{-\alpha\delta}.$

To maximize $\varphi$ we define $\delta^* = \min\{1/\sqrt{d+1},\ 1/\alpha\}$ and obtain

    $\varphi(K_{\rho,\delta^*}, \mu_\rho) \ge 0.0025\;\frac{1}{\sqrt{d+1}}\,\min\Bigl\{\frac{1}{\sqrt{d+1}},\ \frac{1}{\alpha}\Bigr\}.$
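The constants in Corollary 1 can be traced numerically. Inserting $\ell = 0.3$ (Lemma 7) and $D = 2$ into the bound of Theorem 4 gives $\varphi \ge \sqrt{\pi/2}\cdot 9\delta e^{-\alpha\delta}/(1600\sqrt{d+1})$, and at $\delta^* = \min\{1/\sqrt{d+1}, 1/\alpha\}$ the exponential factor is at least $e^{-1}$, which yields the constant $0.0025$. The sketch below evaluates both expressions; the function names are hypothetical, and the case $\alpha = 0$ is handled by reading $1/\alpha$ as infinity.

```python
import math

def delta_star(d: int, alpha: float) -> float:
    """Step size min{1/sqrt(d+1), 1/alpha}; 1/alpha is read as infinity for alpha = 0."""
    s = 1.0 / math.sqrt(d + 1)
    return s if alpha <= 0.0 else min(s, 1.0 / alpha)

def conductance_bound(d: int, alpha: float, delta: float) -> float:
    """First bound of Corollary 1, valid for delta <= 1/sqrt(d+1)."""
    if delta > 1.0 / math.sqrt(d + 1) + 1e-15:
        raise ValueError("Corollary 1 requires delta <= 1/sqrt(d+1)")
    return (math.sqrt(math.pi / 2) * 9.0 * delta * math.exp(-alpha * delta)
            / (1600.0 * math.sqrt(d + 1)))

def conductance_bound_at_delta_star(d: int, alpha: float) -> float:
    """Second bound of Corollary 1, with the explicit constant 0.0025."""
    return 0.0025 * delta_star(d, alpha) / math.sqrt(d + 1)

if __name__ == "__main__":
    for d, alpha in [(3, 0.5), (10, 2.0), (10, 50.0)]:
        ds = delta_star(d, alpha)
        print(d, alpha, ds, conductance_bound(d, alpha, ds),
              conductance_bound_at_delta_star(d, alpha))
```

Since $(9/1600)\sqrt{\pi/2}\,e^{-1} \approx 0.00259$, the rounded constant $0.0025$ is slightly conservative.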
Error bounds. For the class $\mathcal{F}^\alpha(\Omega)$ the above lower conductance bound (33) will
yield an error estimate for the problem (2).

Let $S_n^\delta$ be the estimator based on a sample of the local Metropolis Markov chain
with transition $K_{\rho,\delta}$, starting at zero. To estimate its error we combine the estimates
of the conductance of $K_{\rho,\delta}$ with two results, partially known from the literature. To
formulate the results we note the following. The Markov kernel $K_{\rho,\delta}$ is reversible
with respect to $\mu_\rho$ and hence induces a self-adjoint operator

    $K_{\rho,\delta} : L_2(\Omega, \mu_\rho) \to L_2(\Omega, \mu_\rho).$

The spectrum $\sigma(K_{\rho,\delta})$ is contained in $[-1, 1]$ and $1 \in \sigma(K_{\rho,\delta})$, and we are interested
in the second largest eigenvalue

    $\beta_{\rho,\delta} := \sup\{\sigma \in \sigma(K_{\rho,\delta}) \mid \sigma \ne 1\}$

of $K_{\rho,\delta}$. This is motivated by the extension of a result from [18, Cor. 1] about the
worst case error of $S_n^\delta$, uniformly for $(f, \rho) \in \mathcal{F}^\alpha(\Omega)$.

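Lemma 12.

    $\lim_{n \to \infty}\ \sup_{(f,\rho) \in \mathcal{F}^\alpha(\Omega)} e(S_n^\delta, (f,\rho))^2 \cdot n = \sup_{\rho \in \mathcal{R}^\alpha(\Omega)} \frac{1 + \beta_{\rho,\delta}}{1 - \beta_{\rho,\delta}}.$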

The proof is given in the appendix. For Markov chains which start according to
the invariant distribution $\mu_\rho$ the bound is similar, but more explicit, and was given
in [26] and pH Thm. 1.9]. 

The relation of the second largest eigenvalue $\beta_{\rho,\delta}$ to the conductance is given in

Lemma 13 (Cheeger's Inequality, see [12, 15, 16]).

    $\lambda_{\rho,\delta} := 1 - \beta_{\rho,\delta} \ge \varphi^2(K_{\rho,\delta}, \mu_\rho)/2.$
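To see Cheeger's inequality at work in the simplest possible setting, the following sketch computes the conductance of a finite reversible chain by brute force over all subsets and applies it to a two-state chain, for which the second eigenvalue is known in closed form ($1 - a - b$ for transition probabilities $a$ and $b$). The finite-state setting, the conductance definition as the minimal ratio of flow out of $A$ to $\min\{\pi(A), \pi(A^c)\}$, and all names are illustrative assumptions; the paper works on general state spaces.

```python
from itertools import combinations

def conductance(P, pi):
    """Brute-force conductance of a finite chain with transition matrix P and
    stationary distribution pi: the minimum over proper subsets A of
    (flow from A to its complement) / min(pi(A), pi(A^c))."""
    n = len(pi)
    best = float("inf")
    for k in range(1, n):
        for A in combinations(range(n), k):
            Ac = [j for j in range(n) if j not in A]
            flow = sum(pi[i] * P[i][j] for i in A for j in Ac)
            mass = sum(pi[i] for i in A)
            best = min(best, flow / min(mass, 1.0 - mass))
    return best

# Two-state chain: leave state 0 with probability a, state 1 with probability b.
a, b = 0.3, 0.1
P = [[1 - a, a], [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]      # stationary and reversible for this chain
second_eigenvalue = 1 - a - b        # closed form for a 2x2 stochastic matrix
spectral_gap = 1 - second_eigenvalue
phi = conductance(P, pi)

if __name__ == "__main__":
    print("conductance:", phi, "gap:", spectral_gap, "phi^2/2:", phi ** 2 / 2)
```

For this chain the conductance equals $\max\{a, b\}$, so the inequality reads $a + b \ge \max\{a,b\}^2/2$.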

We are ready to state our main result for the Metropolis algorithm $S_n^\delta$, based on
the Markov chain $K_{\rho,\delta}$, for the class $\mathcal{F}^\alpha(B^d)$, i.e., when $\Omega \subset \mathbb{R}^d$ is the Euclidean
unit ball.

Theorem 5. Let $S_n^\delta(f,\rho) = \frac{1}{n} \sum_{j=1}^n f(X_j)$ be the estimator based on a sample
$(X_1, \dots, X_n)$ of the local Metropolis Markov chain with transition $K_{\rho,\delta}$, where
$\delta \le (d+1)^{-1/2}$. Then

(38)    $\lim_{n \to \infty}\ \sup_{(f,\rho) \in \mathcal{F}^\alpha(B^d)} e(S_n^\delta, (f,\rho))^2 \cdot n \le \frac{8 \cdot 1600^2}{81\,\pi}\,(d+1)\,\frac{e^{2\alpha\delta}}{\delta^2}.$

Again we may choose $\delta^* = \min\{(d+1)^{-1/2}, \alpha^{-1}\}$ and obtain

(39)    $\lim_{n \to \infty}\ \sup_{(f,\rho) \in \mathcal{F}^\alpha(B^d)} e(S_n^{\delta^*}, (f,\rho))^2 \cdot n \le 594700 \cdot (d+1)\,\max\{d+1, \alpha^2\}.$

Proof. This follows from Corollary 1 and Lemmas 12 and 13. □
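The chain of estimates behind Theorem 5 is short: Lemma 12 gives the asymptotic constant $(1+\beta_{\rho,\delta})/(1-\beta_{\rho,\delta}) \le 2/\lambda_{\rho,\delta}$, Lemma 13 gives $\lambda_{\rho,\delta} \ge \varphi^2/2$, so the constant is at most $4/\varphi^2$, and Corollary 1 bounds $\varphi$ from below. The sketch below evaluates the resulting expression $\frac{8\cdot 1600^2}{81\pi}(d+1)e^{2\alpha\delta}/\delta^2$ and checks that, at $\delta^* = \min\{(d+1)^{-1/2}, \alpha^{-1}\}$, it stays below $594700\,(d+1)\max\{d+1,\alpha^2\}$. The function names are hypothetical.

```python
import math

def delta_star(d: int, alpha: float) -> float:
    """Step size min{(d+1)^(-1/2), 1/alpha}; 1/alpha is read as infinity for alpha = 0."""
    s = 1.0 / math.sqrt(d + 1)
    return s if alpha <= 0.0 else min(s, 1.0 / alpha)

def asymptotic_constant(d: int, alpha: float, delta: float) -> float:
    """Bound 8 * 1600^2 / (81 pi) * (d + 1) * exp(2 alpha delta) / delta^2."""
    return (8 * 1600 ** 2 / (81 * math.pi)
            * (d + 1) * math.exp(2 * alpha * delta) / delta ** 2)

def asymptotic_constant_at_delta_star(d: int, alpha: float) -> float:
    """Explicit bound 594700 * (d + 1) * max{d + 1, alpha^2}."""
    return 594700 * (d + 1) * max(d + 1, alpha ** 2)

if __name__ == "__main__":
    # The numerical constant is 8 * 1600^2 * e^2 / (81 pi), rounded up.
    print(8 * 1600 ** 2 * math.exp(2) / (81 * math.pi))
    for d, alpha in [(3, 0.5), (10, 2.0), (10, 50.0)]:
        ds = delta_star(d, alpha)
        print(d, alpha, asymptotic_constant(d, alpha, ds),
              asymptotic_constant_at_delta_star(d, alpha))
```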

5. Summary 

Let us discuss our findings. The results from Section 3 clearly indicate that the
superiority of Metropolis algorithms over simpler (non-adaptive) Monte Carlo
methods does not hold in general. Specifically, it does not hold for the large classes
$\mathcal{F}_C(\Omega)$ of input without additional structure.

On the other hand, for the class $\mathcal{F}^\alpha(B^d)$, specific Metropolis algorithms that are
based on local underlying walks are superior to all non-adaptive methods. Even more,
on $B^d$ the cost of the algorithm $S_n^{\delta^*}$, roughly given by the number $n$ of evaluations
of $\rho$ and $f$, increases like a polynomial in $d$ and $\alpha$. More precisely, according to
(39) the asymptotic constant $\lim_{n \to \infty} e(S_n^{\delta^*}, \mathcal{F}^\alpha(B^d))^2 \cdot n$ is bounded by a constant
times $\max\{d^2, d\alpha^2\}$, i.e., the complexity grows polynomially in $d$ and $\alpha$ and, for
fixed $d$, increases (at most) as $\alpha^2$. If we only allow non-adaptive methods then this
asymptotic constant, again for fixed $d$, increases at least as $\alpha^d$, see (14).

We believe that this problem is tractable in the sense that the number of function
values to achieve an error $\varepsilon$ can be bounded by

(40)    $n(\varepsilon, \mathcal{F}^\alpha(B^d)) \le C\,\varepsilon^{-2}\,d\,\max\{d, \alpha^2\}.$

We did not prove (40), however, since Theorem 5 is only a statement for large $n$.

Notice that according to Theorem 5 the size $\delta^*$ of the underlying ball walk needs
to be adjusted both to the spatial dimension $d$ and the Lipschitz constant $\alpha$.

The analysis of the Metropolis algorithm is based on properties of the underlying
ball walk; in particular we establish uniform ergodicity of the ball walk for convex
bodies $\Omega \subset \mathbb{R}^d$. Also, based on conductance arguments, we provide lower bounds for
the spectral gap of the ball walk.

As a consequence, in the case $\alpha = 0$ the estimate (38) provides an error bound
for the ball walk $(Q_\delta, \mu)$, which is asymptotically of the form $e(S_n^\delta, L_2(B^d, \mu)) \le
C\,\delta^{-1}\,(d/n)^{1/2}$.

The results extend in a similar way to any family $\Omega_d \subset \mathbb{R}^d$ for which the
underlying local ball walk $Q_\delta$ has (for $\delta \le \delta_d$) a non-trivial lower bound for the local
conductance that is independent of the dimension.

Finally, from the results of Section 3 we can conclude that adaption does not
help much for the classes $\mathcal{F}_C(\Omega)$. Hence we have new results concerning the power
of adaption, see [22] for a survey of earlier results, in particular that it may help to
break the curse of dimensionality for the classes $\mathcal{F}^\alpha(B^d)$.

Appendix A. Proof of Lemma 12

Lemma 12 extends the bound from [18, Thm. 1], which deals with a single
uniformly ergodic chain. It was obtained from a contraction property, as stated
in [18, Prop. 1]. The goal of the present analysis is to establish this asymptotic
result uniformly for all Metropolis chains with density from $\mathcal{R}^\alpha(\Omega)$, by showing that
this contractivity holds true uniformly.

Contractivity of the Markov operator. We assign to each transition kernel $K$
on $\Omega$ with corresponding invariant distribution $\mu$ the bounded linear mapping $P$,
given by

(41)    $(Pf)(x) := \int f(y)\,K(x, dy).$

Also we let $E$ denote the mapping which assigns any integrable function its
expectation as a constant function $E(f) := \int_\Omega f(x)\,\mu(dx)$. For each $K$ the mapping $P - E$
is bounded in $L_\infty(\Omega, \mu)$, with norm less than or equal to one, and we shall strengthen
this uniformly for kernels $K_{\rho,\delta}$ with $\rho \in \mathcal{R}^\alpha(\Omega)$. Within this operator context
uniform ergodicity is equivalent to a specific form of quasi-compactness, namely there
are $0 < \eta < 1$ and $n_0 \in \mathbb{N}$ for which

(42)    $\| P^n - E : L_\infty(\Omega) \to L_\infty(\Omega) \| \le \eta, \quad \text{for } n \ge n_0.$

We first show that reversibility allows us to transfer this to the spaces $L_1(\Omega, \mu_\rho)$.

Lemma 14. Suppose that the transition kernel $K$ with corresponding mapping $P$ is
reversible. Then for all $n \in \mathbb{N}$ we have

(43)    $\| P^n - E : L_1(\Omega, \mu) \to L_1(\Omega, \mu) \| \le \| P^n - E : L_\infty(\Omega, \mu) \to L_\infty(\Omega, \mu) \|.$

Proof. If $K$ is reversible, then so are all iterates $K^n$. Thus for arbitrary functions
$f \in L_1(\Omega, \mu)$ and $h \in L_\infty(\Omega, \mu)$ we have, using the scalar product on $L_2(\Omega, \mu)$, that

    $\langle (P^n - E) f, h \rangle = \langle f, (P^n - E) h \rangle.$

Consequently, for any $f \in L_1(\Omega, \mu)$ we have

    $\| (P^n - E) f \|_1 = \sup_{\|h\|_\infty \le 1} |\langle (P^n - E) f, h \rangle| = \sup_{\|h\|_\infty \le 1} |\langle f, (P^n - E) h \rangle|$
    $\qquad \le \|f\|_1 \sup_{\|h\|_\infty \le 1} \| (P^n - E) h \|_\infty,$

from which the proof can be completed. □

Proposition 4. For any convex body $\Omega \subset \mathbb{R}^d$ there are an integer $n_0$ and a constant
$0 < \eta < 1$ such that uniformly for $\rho \in \mathcal{R}^\alpha(\Omega)$ the mapping $P$ corresponding to
$K_{\rho,\delta}$ satisfies

(44)    $\| P^{n_0} - E : L_1(\Omega, \mu_\rho) \to L_1(\Omega, \mu_\rho) \| \le \eta.$

Proof. This is an immediate consequence of the bound (32). As mentioned in
Remark 7, uniform ergodicity was established uniformly for $\rho \in \mathcal{R}^\alpha(\Omega)$. It is well known
(see [19, Thm. 16.2.4]) that this implies that there is an $\eta < 1$ such that uniformly
for $\rho \in \mathcal{R}^\alpha(\Omega)$ we have

(45)    $\| P^n - E : L_\infty(\Omega) \to L_\infty(\Omega) \| \le \eta, \quad \text{for } n \ge n_0.$

In the light of Lemma 14 this yields (44). □

Finally we sketch the

Proof of Lemma 12. Using Proposition 4 we can extend the proof of [18, Thm. 1].
In particular, the bounds from Eq. (13)-(15) in [18] tend to zero uniformly for
$\rho \in \mathcal{R}^\alpha(\Omega)$. Moreover, starting at zero, after one step according to the underlying
ball walk, the (new) initial distribution is uniformly bounded with respect to the
uniform distribution on $\Omega$, hence also with respect to $\mu_\rho$, such that we establish the
asymptotics in Lemma 12. □